only use nightly pytorch in ci #243
Conversation
force-pushed from 538ea8a to dfe09bf
Summary: use the HTTP transport instead of the PG transport -- the PG transport fails to resolve the address when running locally
force-pushed from 8017105 to bfe221f
torchft/manager.py (outdated)
@@ -382,7 +382,7 @@ def allreduce(self, tensor: torch.Tensor, should_quantize: bool = False) -> Work
             )
         else:
             work = self._pg.allreduce([tensor], ReduceOp.SUM)
-        work.wait()
+        work.block_current_stream()
this partially solves it but it doesn't really help the case below with the tensor division
Ideally we wrap the future below in a Work object and then call .block_current_stream() on that
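A minimal sketch of what such a wrapper could look like, assuming torch.distributed.Work can be subclassed from Python; _FutureWork is a hypothetical name, not the wrapper this PR landed.

import torch
import torch.distributed as dist


class _FutureWork(dist.Work):
    """Hypothetical adapter exposing a torch.futures.Future as a Work handle."""

    def __init__(self, fut: torch.futures.Future) -> None:
        super().__init__()
        self._fut = fut

    def wait(self, timeout=None) -> bool:
        # Block the caller until the wrapped future completes.
        self._fut.wait()
        return True

    def get_future(self) -> torch.futures.Future:
        return self._fut

The real wrapper would also have to guarantee that calling block_current_stream() on it orders the current stream after the future, which is the nightly-only behavior this PR depends on.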
> this partially solves it but it doesn't really help the case below with the tensor division

In which case?
- for nccl with cuda, the behavior should be the same as the existing one
- for gloo with cuda, the tensor is on the GPU (after the host-to-device copy, but the tensor arg is also on the GPU), so the callback will also return immediately? IIUC the fut.wait() in the callback that I added will also return immediately
- for gloo without cuda, based on what you said, the callback will be called after the device-to-host copy has completed?
> Ideally we wrap the future below in a Work object and then call .block_current_stream() on that

We also need to call work.wait() or work.block_current_stream() to make sure the work finishes on the current stream first, before the future runs on the current stream.
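A hedged sketch of that ordering with illustrative names (not the exact change that landed): block the current stream on the collective first, then run the tensor division as a continuation that waits on the future before touching the tensor.

import torch
from torch.distributed import ReduceOp


def allreduce_mean(pg, tensor: torch.Tensor, num_participants: int):
    # pg stands in for torchft's ProcessGroup wrapper, which accepts a
    # ReduceOp directly as in the diff above; num_participants is illustrative.
    work = pg.allreduce([tensor], ReduceOp.SUM)

    # Order the collective before anything else queued on the current stream.
    # block_current_stream() is the nightly-only API; work.wait() would also
    # establish the ordering, as noted above.
    work.block_current_stream()

    def _average(fut: torch.futures.Future) -> torch.Tensor:
        # Waiting on the future inside the callback makes the continuation
        # run only after the collective has completed.
        fut.wait()
        tensor /= num_participants
        return tensor

    # The continuation runs once the collective's future completes.
    return work.get_future().then(_average)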
force-pushed from 5507e47 to 61e177c
force-pushed from 4698c70 to d6b54f7
force-pushed from cb39d98 to 685e3c4
LGTM
We should document in the README that we require torch nightly.
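For that README note, a small hedged sketch of a check a user could run up front; the guard and the message are illustrative, not torchft's actual code.

import torch.distributed as dist

# Work.block_current_stream() only exists in recent nightlies at the time of
# this PR, so a stable torch install can be detected early.
work_cls = getattr(dist, "Work", None)
if work_cls is None or not hasattr(work_cls, "block_current_stream"):
    raise RuntimeError(
        "a PyTorch nightly build is required: "
        "torch.distributed.Work.block_current_stream is missing"
    )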
force-pushed from d650c7a to 09208a0
Summary:
- call future.wait() in callbacks to make sure the continuation executes after the future has completed
- set the stream correctly to execute the callback scheduled by the bucketized allreduce (see the sketch below)
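A hedged sketch of the second bullet with hypothetical names: pin the continuation of a bucketized allreduce to an explicitly chosen stream and wait on the future first, so the follow-up math is ordered after the collective.

import torch


def make_unbucketize_callback(
    stream: torch.cuda.Stream, flat_bucket: torch.Tensor, num_participants: int
):
    # Hypothetical helper -- the real torchft callback differs, but the
    # pattern is the same: wait on the future, then run the follow-up math
    # on the stream the bucketized allreduce was issued on.
    def _cb(fut: torch.futures.Future) -> torch.Tensor:
        fut.wait()  # ensure the continuation runs after the collective
        with torch.cuda.stream(stream):
            flat_bucket /= num_participants
        return flat_bucket

    return _cb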
Summary: return the Work object so we can be more flexible with its usage
Summary:
- change CI to only use nightly, since block_current_stream is not in stable yet
- fix new errors in the nightly version of pyre
- remove fixme[29] about the future not being a function
- make reduce_scatter_quantized return a Work object